Job Title: Spark Developer / Engineer (2 positions)
Location: US Remote (working hours in the PST time zone)
Duration: 6-12 Months
Workflows are powered by offline batch jobs written in Scalding, a MapReduce-based framework. To enhance scalability and performance, we are migrating these jobs from Scalding to Apache Spark.
Key Responsibilities:
Understanding the Existing Scalding Codebase
o Analyze the current Scalding-based data pipelines.
o Document existing business logic and transformations.
Migrating the Logic to Spark
o Convert existing Scalding jobs to Spark (PySpark/Scala) while ensuring optimized performance (see the migration sketch after this list).
o Refactor data transformations and aggregations in Spark.
o Optimize Spark jobs for efficiency and scalability.
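For illustration, the migration typically resembles the following minimal sketch; the job name (EventCountJob), input schema (userId, eventType), and paths are hypothetical, not drawn from the actual codebase.

    // Hypothetical Scalding job: counts events per user from a TSV source.
    import com.twitter.scalding._

    class EventCountJob(args: Args) extends Job(args) {
      TypedPipe.from(TypedTsv[(String, String)](args("input"))) // (userId, eventType)
        .map { case (userId, _) => (userId, 1L) }
        .sumByKey
        .toTypedPipe
        .write(TypedTsv[(String, Long)](args("output")))
    }

    // Equivalent Spark job using the Scala DataFrame API.
    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.count

    object EventCount {
      def main(cmdArgs: Array[String]): Unit = {
        val spark = SparkSession.builder.appName("EventCount").getOrCreate()
        spark.read.option("sep", "\t").csv(cmdArgs(0))
          .toDF("userId", "eventType")
          .groupBy("userId")
          .agg(count("*").as("eventCount"))
          .write.option("sep", "\t").csv(cmdArgs(1))
        spark.stop()
      }
    }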
Ensuring Data Parity & Validation
o Develop data parity tests that compare outputs between the Scalding and Spark implementations (a parity-check sketch follows this list).
o Identify and resolve any discrepancies between the two versions.
o Work with stakeholders to validate correctness.
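A parity check of this kind can be sketched as below; the TSV paths and two-column schema are assumptions for illustration only.

    // Compares the legacy Scalding output against the migrated Spark output.
    import org.apache.spark.sql.{DataFrame, SparkSession}

    object ParityCheck {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder.appName("ParityCheck").getOrCreate()

        def load(path: String): DataFrame =
          spark.read.option("sep", "\t").csv(path).toDF("userId", "eventCount")

        val scaldingOut = load(args(0)) // output of the legacy Scalding job
        val sparkOut    = load(args(1)) // output of the migrated Spark job

        // exceptAll is a multiset difference: rows in one output but not the other.
        val missingInSpark    = scaldingOut.exceptAll(sparkOut)
        val unexpectedInSpark = sparkOut.exceptAll(scaldingOut)

        val diffs = missingInSpark.count() + unexpectedInSpark.count()
        println(s"Parity check found $diffs differing rows")
        if (diffs > 0) {
          missingInSpark.show(20, truncate = false)
          unexpectedInSpark.show(20, truncate = false)
        }
        spark.stop()
      }
    }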
Writing Unit Tests & Improving Code Quality
o Implement robust unit and integration tests for Spark jobs (see the test sketch after this list).
o Ensure code meets engineering best practices (modular, reusable, and well-documented).
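A unit test along these lines might use a local SparkSession; the choice of ScalaTest and all names here are illustrative assumptions.

    import org.apache.spark.sql.SparkSession
    import org.scalatest.funsuite.AnyFunSuite

    class EventCountSuite extends AnyFunSuite {
      private lazy val spark = SparkSession.builder
        .master("local[2]")
        .appName("EventCountSuite")
        .getOrCreate()

      test("counts events per user") {
        import spark.implicits._
        val input = Seq(("u1", "click"), ("u1", "view"), ("u2", "click"))
          .toDF("userId", "eventType")

        // Same aggregation as the migrated job, run against in-memory data.
        val result = input.groupBy("userId").count()
          .as[(String, Long)].collect().toMap

        assert(result == Map("u1" -> 2L, "u2" -> 1L))
      }
    }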
Required Qualifications:
Job Type: Contract
Pay: $49.48 - $65.00 per hour
Expected hours: 40 per week
Work Location: Remote